Performance Optimization of Software Distributed Shared Memory Systems
Abstract
Software distributed shared memory (DSM) systems, also known as shared virtual memory, are advocated as an ideal vehicle for parallel programming because they combine the programmability of shared memory with the scalability of distributed memory systems. The challenge in building a software DSM system is to achieve good performance over a wide range of parallel programs without requiring programmers to restructure their shared memory parallel programs or to make more than minor modifications to their sequential programs. The overhead of maintaining consistency in software and the high latency of sending messages make this target difficult to achieve. In this dissertation, we try to improve the performance of software DSM systems by examining, in turn, the cache coherence protocol, the memory organization scheme, system overhead, load balancing, and communication optimization. After analyzing the disadvantages of snoopy and directory-based cache coherence protocols, we propose a lock-based cache coherence protocol for scope consistency. The unique characteristic of this protocol is that it applies the “home” concept not only to data but also to coherence information: each piece of coherence information has a static home determined by its corresponding synchronization object. As a result, the lock-based protocol has the least coherence-related overhead on ordinary read and write misses, and it is free from the overhead of maintaining a directory. Based on this protocol, we designed a simple but efficient software DSM system named JIAJIA. JIAJIA is home-based and employs a novel memory organization scheme that eliminates the overhead of address translation and supports a large shared address space formed by combining the physical memories of multiple nodes.
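The idea of giving both pages and coherence information a static home can be illustrated with a small single-process model. This is only a sketch under stated assumptions, not JIAJIA's implementation: the `Cluster` class, the modulo home mapping, whole-page values in place of diffs, and the absence of real messaging and mutual exclusion are all simplifications introduced here. Write notices are kept at the home of the lock, written pages are flushed to their home on release, and an acquirer invalidates its cached copies of the pages named in the notices.

```python
from collections import defaultdict


class Cluster:
    """Toy single-process model of a home-based, lock-based protocol (hypothetical)."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.memory = {}                                  # home copies: page -> value
        self.cache = [dict() for _ in range(num_nodes)]   # per-node cached remote pages
        self.dirty = [set() for _ in range(num_nodes)]    # pages written since last release
        # Write notices are stored per node, at the home of each lock.
        self.notices = [defaultdict(set) for _ in range(num_nodes)]

    def page_home(self, page):
        return page % self.num_nodes      # assumed static mapping

    def lock_home(self, lock):
        return lock % self.num_nodes      # assumed static mapping

    def read(self, node, page):
        if self.page_home(page) == node:
            return self.memory.get(page, 0)               # home pages need no fetch
        if page not in self.cache[node]:
            # Ordinary read miss: fetch from the home, no directory lookup.
            self.cache[node][page] = self.memory.get(page, 0)
        return self.cache[node][page]

    def write(self, node, page, value):
        if self.page_home(page) == node:
            self.memory[page] = value
        else:
            self.cache[node][page] = value
        self.dirty[node].add(page)

    def acquire(self, node, lock):
        # Fetch write notices from the lock's home and invalidate those pages.
        for page in self.notices[self.lock_home(lock)][lock]:
            self.cache[node].pop(page, None)

    def release(self, node, lock):
        # Flush written pages to their homes, then record write notices
        # at the lock's home for the next acquirer.
        for page in self.dirty[node]:
            if self.page_home(page) != node and page in self.cache[node]:
                self.memory[page] = self.cache[node][page]
        self.notices[self.lock_home(lock)][lock] |= self.dirty[node]
        self.dirty[node].clear()
```

In this model a node that has not acquired the lock may keep reading a stale cached page, which scope consistency permits; the update becomes visible only after the node acquires the same lock.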
Based on a detailed analysis of system overhead in software DSM systems, we propose several optimization techniques, such as read notices, a hierarchical barrier implementation, and a cache-only write detection scheme, to reduce system overhead in home-based software DSMs. Performance evaluation results show that these techniques improve performance greatly. The dissertation also proposes an affinity-based self-scheduling (ABS) method for loop scheduling. In a meta-computing environment, ABS achieves the best performance among previously proposed schemes because it reduces both synchronization overhead and the waiting time caused by load imbalance; in a dedicated environment, it is comparable with the best scheduling schemes. For iterative scientific applications, we first argue that a task should be a combination of a computation subtask and its corresponding data subtask. We then propose a task migration scheme that integrates computation migration and data migration to achieve better resource utilization in a meta-computing environment. The relationship between a computation subtask and its corresponding data is mined by our run-time system, and data migration is carried out by a novel home migration scheme, which is an important characteristic of JIAJIA. To our knowledge, this is the first such implementation in home-based software DSMs. Finally, the dissertation designs and implements a user-level communication optimization scheme (JMCL1.0) that is specific to home-based software DSM systems on Myrinet. As a result, the number of memory copies per communication is reduced from 7 to 2. Furthermore, the interface between the communication substrate and the software DSM becomes simpler than that of the traditional UDP/IP network protocol.
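The abstract does not spell out the ABS algorithm, so the following is only a generic affinity-style self-scheduling simulation, not the dissertation's method. The function name `affinity_schedule`, the `speeds` and `chunk_divisor` parameters, and the chunk-size rules are assumptions made for illustration. Each worker starts with a contiguous block of iterations (so repeated loops would touch the same data), takes a fraction of its own remaining block at a time, and takes work from the most loaded worker only when its own block is empty.

```python
import math


def affinity_schedule(num_iters, speeds, chunk_divisor=2):
    """Simulate affinity-style loop scheduling (illustrative sketch only).

    speeds[w] is the assumed number of iterations worker w completes per
    time unit. Returns (executed, clock): the iterations each worker ran
    and each worker's finishing time.
    """
    p = len(speeds)
    # Initial static partition into contiguous per-worker blocks.
    base, extra = divmod(num_iters, p)
    queues, start = [], 0
    for w in range(p):
        size = base + (1 if w < extra else 0)
        queues.append(list(range(start, start + size)))
        start += size

    executed = [[] for _ in range(p)]
    clock = [0.0] * p
    while any(queues):
        # The worker that becomes idle first asks for more work.
        w = min(range(p), key=lambda i: clock[i])
        if queues[w]:
            # Take a fraction of the local block to limit queue accesses.
            take = math.ceil(len(queues[w]) / chunk_divisor)
            chunk, queues[w] = queues[w][:take], queues[w][take:]
        else:
            # Local block exhausted: take a share from the most loaded worker.
            victim = max(range(p), key=lambda i: len(queues[i]))
            take = math.ceil(len(queues[victim]) / p)
            chunk = queues[victim][-take:]
            del queues[victim][-take:]
        executed[w].extend(chunk)
        clock[w] += len(chunk) / speeds[w]
    return executed, clock
```

Calling `affinity_schedule(100, [1.0, 1.0, 4.0])` models two slow workers and one fast worker; the fast worker finishes its own block early and then picks up part of a slower worker's block, which is the kind of load imbalance a meta-computing environment produces.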
Similar resources
DSMSim: A Distributed Shared Memory Simulator for Clusters of Symmetric Multi-Processors
Distributed shared memory systems have become popular as a means of utilizing clusters of computers for solving large applications. We have developed a high-performance DSM at Wayne State University. To improve the performance of our DSM, we have developed a memory hierarchy simulator that allows us to compare various techniques very quickly and with much less effort. This paper describes our s...
Performance Analysis and Improvement of OpenMP on Software Distributed Shared Memory Systems
In this paper, the performance of the portable OpenMP compiler on SDSM JIAJIA is analyzed using SPEC OMPM2001 benchmark. The overheads of parallel execution have been investigated from the aspects of thread management and task schedule, memory access and synchronization. To improve the performance, the page placement and data privatization techniques have been implemented for the optimization o...
Limits to the Performance of Software Shared Memory: A Layered Approach
Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that were written for the more proven hardware-coherent shared memory is still not very good on these systems. Three major layers of software (and hardware) stand between the end user and parallel performance, each with i...
Towards OpenMP Execution on Software Distributed Shared Memory Systems
In this paper, we examine some of the challenges present in providing support for OpenMP applications on a Software Distributed Shared Memory (DSM) based cluster system. We present detailed measurements of the performance characteristics of realistic OpenMP applications from the SPEC OMP2001 benchmarks. Based on these measurements, we discuss application and system characteristics that impede th...
Reducing Coherence-Related Communication in Software Distributed Shared Memory Systems
Distributed shared memory (DSM) is an abstraction of shared memory on a distributed memory machine. Hardware DSM systems support this abstraction at the architecture level; software DSM systems support the abstraction within the runtime system. One of the key problems in building an efficient software DSM system is to reduce the amount of communication needed to keep the distributed memories cohe...
Mechanisms and interfaces for software-extended coherent shared memory
Software-extended systems use a combination of hardware and software to implement shared memory on large-scale multiprocessors. Hardware mechanisms accelerate common-case accesses, while software handles exceptional events. In order to provide fast memory access, this design strategy requires appropriate hardware mechanisms including caches, location-independent addressing, limited directories,...
Publication date: 2007